Introduction

Kwakye and Phil have been collecting weather data that dates back as far as 1921 to the present day. My task is to search for data online that could be correlated with this weather data of ours.

Our weather data contains the following:

  • variables:
    • Location (Central London, Heathrow, Bracknell, Kew, Oxford, Stanstead ),
    • Dates (from 1921 to the present day)
    • Max Temperature
    • Min Temperature
    • Sunshine
    • Precipitation
    • Frost
    • Weather Station
    • DataSource
    • Record Type Number

There are two types of possible comparisons with these datasets: Within comparisons, where we compare the data within it’s own dataset (its own variables against one another), or between comparisons, comparing datasets with one another. It’s worth doing some within comparisons first before between comparisons, so that we really understand the data we’re investigating.

Dataset 1: London Air Quality Levels

The first is a dataframe that contains information about the quality of air in London between 2008-present day, by month.

  • Website: https://data.london.gov.uk/dataset/london-average-air-quality-levels
  • source: London Datastore
  • Author: King’s College London
  • variables:
    • Date (2008 to present day)
    • Nitric Oxide (ug/m3)
    • Nitrogen Dioxide (ug/m3)
    • Oxides of Nitrogen (ug/m3)
    • Ozone (ug/m3)
    • PM10 Particulate (ug/m3)
    • PM2.5 Particulate (ug/m3)
    • Sulphur Dioxide (ug/m3)

Possible comparisons between datasets?

  • Ozone VS. Min & Max temperature
  • Ozone VS. Sunshine
  • Nitrogen Dioxide VS. Min & Max temperature
  • Nitrogen Dioxide VS. Sunshine
  • Sulphur Dioxide VS. Min & Max temperature
  • Sulphur Dioxide VS. Sunshine

In order to compare these two dataframes, it’s essential to manipulate them so that they have similar formatting. Our weather data will need to be aggregated by day and month to compare with the london.gov data that’s also by day and month.

Below is the code for the processes of this report, including:

  • Extracting Dates
  • Merging the open source data and our weather data
  • renaming
  • aggregating

and so on.

### Server connector
weatherConn <- odbcDriverConnect('driver={SQL Server};server=rlibdbwp01.twutil.net;database=weatherDB;trusted_connection=true')

#if we want to get everything from the weather data..
# sqlParam <- "SELECT  * FROM daily "
# dailyData <- sqlQuery(weatherConn, sqlParam)


### Gets smd paramaters from the server
sqlParam <- "SELECT  * FROM daily WHERE [RecordTypeNumber] = 2 AND [Location] = 'Central London' AND [Date] >= '1 Jan 2008' " #London Air Quality data only goes back as far as 2008. Phil suggested using "record type number 2".
dailyData <- sqlQuery(weatherConn, sqlParam)

#Extracting Month and Years
dailyData$Date <- as.Date(dailyData$Date)
dailyData$Month <- months(dailyData$Date)
dailyData$Year <- format(dailyData$Date,format="%y")

dailyData <- dailyData %>%
  mutate(Difference = Maxtemp - Mintemp)
#Averages of the variables 
#by month and year type, as this is how the data from gov.uk is laid out.

DailyDataMean <- aggregate(cbind(Maxtemp, Mintemp, Sunshine, Precipitation, Frost) ~ Month + Year, data = dailyData, mean)

#Q: are we interested in frost??
#importing dataframes
#roadside
Air_quality_london_roadside <- read_excel("H:/my office/Gwen Stuff/Projects/Project4 - Weather Data/WeatherDataVSOpenSourceData/Copy of air-quality-london.xls", 
                                          sheet = "Monthly_averages_roadside")
#background
Air_quality_london_background <- read_excel("H:/my office/Gwen Stuff/Projects/Project4 - Weather Data/WeatherDataVSOpenSourceData/Copy of air-quality-london.xls", 
    sheet = "Monthly_averages_background")

#renaming so it's easier to plot
setnames(Air_quality_london_roadside, old = c('Nitric Oxide (ug/m3)','Nitrogen Dioxide (ug/m3)', 'Oxides of Nitrogen (ug/m3)', 'Ozone (ug/m3)', 'PM10 Particulate (ug/m3)', 'PM2.5 Particulate (ug/m3)', 'Sulphur Dioxide (ug/m3)'), new = c('Nitric_Oxide_R','Nitrogen_Dioxide_R', 'Oxides_of_Nitrogen_R', 'Ozone_R', 'PM10_Particulate_R', 'PM2.5_Particulate_R', 'Sulphur_Dioxide_R'))
setnames(Air_quality_london_background, old = c('Nitric Oxide (ug/m3)','Nitrogen Dioxide (ug/m3)', 'Oxides of Nitrogen (ug/m3)', 'Ozone (ug/m3)', 'PM10 Particulate (ug/m3)', 'PM2.5 Particulate (ug/m3)', 'Sulphur Dioxide (ug/m3)'), new = c('Nitric_Oxide_B','Nitrogen_Dioxide_B', 'Oxides_of_Nitrogen_B', 'Ozone_B', 'PM10_Particulate_B', 'PM2.5_Particulate_B', 'Sulphur_Dioxide_B'))

#Extracting Month and Years for roadside air quality
Air_quality_london_roadside$Date <- as.Date(as.yearmon(Air_quality_london_roadside$Month)) #function from the library package "zoo"
Air_quality_london_roadside$Month <- months(Air_quality_london_roadside$Date)
Air_quality_london_roadside$Year <- format(Air_quality_london_roadside$Date,format="%y")


#Extracting Month and Years for background air quality
Air_quality_london_background$Date <- as.Date(as.yearmon(Air_quality_london_background$Month)) 
Air_quality_london_background$Month <- months(Air_quality_london_background$Date)
Air_quality_london_background$Year <- format(Air_quality_london_background$Date,format="%y")


####Merging TW Weather data and open source data together----
WeatherDataVsGovUK <- Reduce(function(x, y) merge(x, y, by=c("Month", "Year")), list(DailyDataMean, Air_quality_london_background, Air_quality_london_roadside))

Subplots

# OzoneVSTemp<- ggplot(WeatherDataVsGovUK, aes(x = Date.x, label = Date.x))+
#   geom_line(aes(y = Ozone_B, colour = "ozone Background levels (ug/m3)"))+
#   geom_line(aes(y = Ozone_R, colour = "ozone Roadside levels (ug/m3)"))+
#   geom_line(aes(y = Mintemp, colour = "min temperature (°C)")) + 
#   geom_line(aes(y = Maxtemp, colour = "max temperature (°C)")) +
#   labs(x = "Date", y = "Temp&Ozone")+
#   theme(legend.position="none")
#   ggtitle("Ozone Levels (gov.uk) VS Temp")

  
tmp1 <- scale_colour_manual("Legend", values = c(ozone_B = "red", ozone_R = "#990000", min_temp = "blue", max_temp = "purple"))
tmp2 <- scale_colour_manual("Legend", values = c(Sunshine = "#FFCC33", ozone_B = "red", ozone_R = "#990000"))
tmp3 <- scale_colour_manual("Legend", values = c(Nitrogen_Dioxide_B = "#006600", Nitrogen_Dioxide_R = "green", min_temp = "blue", max_temp = "purple"))
tmp4 <- scale_colour_manual("Legend", values = c(Sunshine = "#FFCC33", Nitrogen_Dioxide_B = "#006600", Nitrogen_Dioxide_R = "green"))
tmp5 <- scale_colour_manual("Legend", values = c(Sulphur_Dioxide_B = "#00CCCC", Sulphur_Dioxide_R = "#FF66CC", min_temp = "blue", max_temp = "purple"))
tmp6 <- scale_colour_manual("Legend", values = c(Sunshine = "#FFCC33", Sulphur_Dioxide_B = "#00CCCC", Sulphur_Dioxide_R = "#FF66CC"))

 OzoneVSTemp<- ggplot(WeatherDataVsGovUK, aes(x = Date.x, label = Date.x))+
   geom_line(aes(y = Ozone_B, colour = "ozone_B"))+
   geom_line(aes(y = Ozone_R, colour = "ozone_R"))+
   geom_line(aes(y = Mintemp, colour = "min_temp")) + 
   geom_line(aes(y = Maxtemp, colour = "max_temp")) +
   labs(x = "Date", y = "Ozone&Temp")+
   ggtitle("Ozone levels (gov.uk) VS Sunshine (TW)")+
   tmp1
  
  
  

OzoneVSSunshine <- ggplot(WeatherDataVsGovUK, aes(x = Date.x, label = Date.x))+
  geom_line(aes(y = Ozone_B, colour = "ozone_B"))+
  geom_line(aes(y = Ozone_R, colour = "ozone_R"))+
  geom_line(aes(y = Sunshine, colour = "Sunshine")) + 
  labs(x = "Date", y = "Ozone&Sunshine")+
  ggtitle("Ozone Levels (gov.uk) VS Sunshine (TW)")+
  tmp2
  


NDVSTemp<- ggplot(WeatherDataVsGovUK, aes(x = Date.x, label = Date.x))+
  geom_line(aes(y = Nitrogen_Dioxide_B, colour = "Nitrogen_Dioxide_B"))+
  geom_line(aes(y = Nitrogen_Dioxide_R, colour = "Nitrogen_Dioxide_R"))+
  geom_line(aes(y = Mintemp, colour = "min_temp")) + 
  geom_line(aes(y = Maxtemp, colour = "max_temp")) +
  labs(x = "Date", y = "Nitrogen Dioxide&Temp")+
  ggtitle("Nitrogen Dioxide Levels (gov.uk) VS Temperature (TW)")+
  tmp3


NDVSSunshine<- ggplot(WeatherDataVsGovUK, aes(x = Date.x, label = Date.x))+
  geom_line(aes(y = Nitrogen_Dioxide_B, colour = "Nitrogen_Dioxide_B"))+
  geom_line(aes(y = Nitrogen_Dioxide_R, colour = "Nitrogen_Dioxide_R"))+
  geom_line(aes(y = Sunshine, colour = "Sunshine")) +
  labs(x = "Date", y = "Nitrogen Dioxide Sunshine")+
  ggtitle("Nitrogen Dioxide Levels (gov.uk) VS Sunshine (TW)")+
  tmp4


  SDVSTemp<- ggplot(WeatherDataVsGovUK, aes(x = Date.x, label = Date.x))+
    geom_line(aes(y = Sulphur_Dioxide_B, colour = "Sulphur_Dioxide_B"))+
    geom_line(aes(y = Sulphur_Dioxide_R, colour = "Sulphur_Dioxide_R"))+
    geom_line(aes(y = Mintemp, colour = "min_temp")) + 
    geom_line(aes(y = Maxtemp, colour = "max_temp")) +
    labs(x = "Date", y = "Sulhur Dioxide&Temp")+
    ggtitle("Sulphur Dioxide Levels (gov.uk) VS Temperature (TW)")+
    tmp5

SDVSSunshine<- ggplot(WeatherDataVsGovUK, aes(x = Date.x, label = Date.x))+
  geom_line(aes(y = Sulphur_Dioxide_B, colour = "Sulphur_Dioxide_B"))+
  geom_line(aes(y = Sulphur_Dioxide_R, colour = "Sulphur_Dioxide_R"))+
  geom_line(aes(y = Sunshine, colour = "Sunshine")) + 
labs(x = "Date", y = "Sulphur Dioxide&Sunshine")+
  ggtitle("Sulphur Dioxide Levels (gov.uk) VS Sunshine (TW)")+
  tmp6


p <- ggplotly(subplot(OzoneVSTemp, OzoneVSSunshine, NDVSTemp, NDVSSunshine, SDVSTemp, SDVSSunshine, nrows = 3, margin = 0.05, titleY = TRUE)) %>%
  layout(showlegend = FALSE, title = "Air Quality Data (gov.uk) against temperature & Sunshine (TW)")
## We recommend that you use the dev version of ggplot2 with `ggplotly()`
## Install it with: `devtools::install_github('hadley/ggplot2')`
## We recommend that you use the dev version of ggplot2 with `ggplotly()`
## Install it with: `devtools::install_github('hadley/ggplot2')`
## We recommend that you use the dev version of ggplot2 with `ggplotly()`
## Install it with: `devtools::install_github('hadley/ggplot2')`
## We recommend that you use the dev version of ggplot2 with `ggplotly()`
## Install it with: `devtools::install_github('hadley/ggplot2')`
## We recommend that you use the dev version of ggplot2 with `ggplotly()`
## Install it with: `devtools::install_github('hadley/ggplot2')`
## We recommend that you use the dev version of ggplot2 with `ggplotly()`
## Install it with: `devtools::install_github('hadley/ggplot2')`
ggplotly(p)

Insight & Graph Descriptions

Here are some line graphs of the variables we’re most likely interested in between our weather data against London.gov.uk’s data on air quality. We see these variables ploted against time: how they all relate as time passes.

The x-axis is date, from 2008-2018, as the air quality data only goes as far back as 2008. The data is monthly, meaning that each point that we hover on is the average for that month. For example, if we hover over the top right hand plot (Ozone levels VS. Temperature), over the minimum line and see the first point for temperature (01/01/2008) is 6.19 degrees celsius, what we mean is that the average temperature across all the minimum temperatures for each day in January 2008 is 6.19 degrees as a mean estimate. Likewise, the average temperature across all the maximum temperatures in January 2008 is 11.08 degrees.

Another point to make: because we are plotting all our measurements monthly, if we refer to “anywhere between month a and month b, we only refer to singular monthly points between them. I.e between January and March would be the 3 points: the average of January, average of February and the average for March.

The y-axis is “value”, with no units. This is so we can plot different variables with different units on the same plots indiscriminately. The label of the y-axis is simply the two fields we’re plotting: 1 from london.gov.uk, the other from our weather data. For each field from the air quality data come 2 variables: data from the roadside and data from the background.

For example,

  • Nitrogen_Dioxide_B is the variable representing Nitrogen Dioxide levels in the background
  • Nitrogen_Dioxide_R represents Nitrogen Dioxide levels by the roadside.

There are also variations of Temperature from our weather data:

  • min_temp (minimum temperature/day)
  • max_temp (maximum temperature/day)

Legends of the plots explained:

London.gov.uk Quality Data Variables:

  • — Ozone_B
  • — Ozone_R
  • — Nitrogen_Dioxide_B
  • — Nitrogen_Dioxide_R
  • — Sulphur_Dioxide_B
  • — Sulphur_Dioxide_R

Thames Water Weather Data Variables:

  • — max_temp
  • — min_temp
  • — Sunshine

For all these variables, they peak & reach their lowest points only once (occasionally twice) a year.

Close Up Plots

Ozone Vs. Temperature

ggplotly(OzoneVSTemp)
## We recommend that you use the dev version of ggplot2 with `ggplotly()`
## Install it with: `devtools::install_github('hadley/ggplot2')`

Ozone and temperature rise and fall similarly between seasons. When ozone is rising, temperature is rising and when ozone is falling, so is temperature. It seems that they are correlated to one another, and that one is influencing the other. This is what we predict for our data, as outside research suggests they do indeed have a relationship.

What I noticed here though is that there is a short delay in the peak of temperature compared with the ozone peak. When ozone is at it’s peak, its not until a few months later until temperature reaches its peak. There’s also a short delay in the lowest temperature point compared with the lowest levels of ozone. These delays are between 1-3 months.

  • Ozone levels consistently peak at the same time each year:May/occassionally April.
  • Temperature also peaks at the same time each year: July/occassionally August

  • Ozone levels reach their lowest point anywhere between October-February. Lowest points after December tend to be in the earlier years of the timeline. Lowest Ozone levels before December, like in October or November, tend to be later on in the timeline.
  • Temperature is at its lowest between December-February.

Statements from our data
  1. Background Ozone levels are higher than roadside levels, which is not what we’d expect? Surely we would expect ozone to be higher by the road, next to traffic? Perhaps this is a mistake in the data on their grounds?
  2. Maximum temperature per day is (obviously) higher than the minimum temperature per day, but it’s worth noting that the rise and fall in temperature overtime shows similar patters between the min and max tempature each day. In other words, if the maximum temperature falls, so does the minimum temperature and vice versa. But we can’t assume a correlation yet without further investigation.
  3. Ozone road levels and ozone background levels also rise & fall together. When 1 falls/rises, so does the other.
  4. Min. & max. temperature peak together, and the different ozone levels peak together (per season/year)! When background Ozone levels are at their highest, so are roadside ozone levels. When the maximum temperature is at its highest, so is the minimum temperature. On the very same day.

Ozone Levels Vs. Sunshine

ggplotly(OzoneVSSunshine)
## We recommend that you use the dev version of ggplot2 with `ggplotly()`
## Install it with: `devtools::install_github('hadley/ggplot2')`

Now we look into sunshine against ozone. Sunshine also rises and falls with ozone levels throughout the year, consistently from 2008-2018. It also has a delay in peak compared with ozone, as did temperature. Maybe because temperature delayed from the previous plot is why sunshine did, because temperature and sunshine are related? We still can’t be too sure! But this is my guess: that a peak & drop delay in sunshine compared with ozone’s levels may be related to temperature’s delaying pattern.

  • Ozone levels consistently peak at the same time each year:May/occassionally April.
  • Sunshine usually peaks in June/July, but not always the case. Between 2008-18, it peaks anywhere between April and August.

  • Ozone levels reach their lowest point anywhere between October-February. Lowest points after December tend to be in the earlier years of the timeline. Lowest Ozone levels before December, like in October or November, tend to be later on in the timeline.
  • Sunshine is at it’s lowest anywhere between November-January.

Nitrogen Dioxide Levels Vs. Temperature

ggplotly(NDVSTemp)
## We recommend that you use the dev version of ggplot2 with `ggplotly()`
## Install it with: `devtools::install_github('hadley/ggplot2')`

Next we introduce Nitrogen Dioxide, first plotted against temperature.

Nitrogen Dioxide follows a similar fluctuating pattern just like ozone, temperature and sunshine that we’ve seen previously, fluctuating between seasons and years. However, there is a noticeable difference: When temperature peaks, Nitrogen Dioxide levels are at their lowest. Nitrogen Dioxide patterns are almost the complete opposite to what was seen with Ozone.

Furthermore, there doesn’t seem to be a consistent delay of peak or drop for either temperature of Nitrogen dioxide. Some years, temperature will peak sooner than Nitrogen Dioxide reaches its lowest point (e.g. 2008). Other years, Nitrogen Dioxide reaches its lowest point before temperature peaks (e.g. 2012). This may be explained by the more eratic behaviour of the changes in Nitrogen Dioxide (more frequent fluctuations) compared with ozone levels.

  • Nitrogen Dioxide levels peak any time between October-March.
  • Temperature peaks at the same time each year: July/occassionally August

  • Nitrogen Dioxide levels reaches their lowest point between June-August. (summer)
  • Temperature is at its lowest between December-February. (winter)

Another thing I notice from this plot is that the difference between roadside & background levels of Nitrogen Dioxide is greater than the difference between roadside & background levels of Ozone.

Nitrogen Dioxide Levels Vs.Sunshine

ggplotly(NDVSSunshine)
## We recommend that you use the dev version of ggplot2 with `ggplotly()`
## Install it with: `devtools::install_github('hadley/ggplot2')`

Not much more can be said about this plot than the previous one. Sunshine levels peak approximately when Nitrogen Dioxide levels are at their lowest.

Sulphur Dioxide Levels Vs. Temperature

ggplotly(SDVSTemp)
## We recommend that you use the dev version of ggplot2 with `ggplotly()`
## Install it with: `devtools::install_github('hadley/ggplot2')`

Sulhur Dioxide fluctuates much more abruptly than the other variables that we’ve seen: not such obious dips and peaks as the others. It’s not very clear, but sulphur dioxide levels tends to peak when temperature is at its lowest and vice versa, like Nitrogen Dioxide.Lowest sulphur dioxide 0.04 (2dp) on 10/2015. Highest at 7.61 on 01/2017 where the value is the same figure for temperature.. the lines meet.

Sulphur Dioxide Levels Vs. Sunshine

ggplotly(SDVSSunshine)
## We recommend that you use the dev version of ggplot2 with `ggplotly()`
## Install it with: `devtools::install_github('hadley/ggplot2')`

Due to the closeup on the scaling, it’s even more obvious the ferocity of the Sulphur Dioxide fluctuations, when we see them plotted with sunshine. Sulphur Dioxide levels do peak when sunshine is at its lowest, and vice versa.

Correlations

We can also do correlation plots for this data! I have decided to do Spearman correlations for this data, as I am interested in the monotonicity of the data between the two variables than the linearity.

Thames Water Weather Data VS. Air Quality Data

drops1 <- c("Month","Year", "Date.x", "Date.y")
corr1 <- cor(WeatherDataVsGovUK[ , !(names(WeatherDataVsGovUK) %in% drops1)], use="pairwise.complete.obs", method=c("spearman"))



p<-ggcorrplot(corr1, hc.order = TRUE, type = "lower",
              outline.col = "black", 
              ggtheme = ggplot2::theme_gray,
              colors = c("#6D9EC1", "white", "#E46726"))+
  ggtitle("Correlation of variables between Air Quality Data and TW Weather Data in London, 2008-2018")
ggplotly(p)
## We recommend that you use the dev version of ggplot2 with `ggplotly()`
## Install it with: `devtools::install_github('hadley/ggplot2')`

Here, all of the variables from our two datasets are correlated against one another. Red colours imply a positive correlation between two variables, whereas blue colours imply a negative correlation. The stronger the colour, the stronger the correlation. The strength in colour to indicate colour is universal practice among most correlation plots of this type/this package.

The plot is of type “lower”, which means that the plot excludes repitions for the same correlation and misses out correlations for the same two variables.

The first thing that catches my eye is the whole block of blue, for temperature and sunshine are negatively correlated with almost every field from the Air Quality data apart from ozone.

From our Weather data, ozone is most correlated with sunshine (0.71), indicating higher ozone levels on sunnier days.

Oxides of Nitrogen seem to be most strongly (negatively) correlated with our weather data.. whereas there’s not much of a correlation with sulphur dioxide. Precipitation (rain..) has hardly any relationship with our London’s air quality data, which is surprising…

Thames Water Weather Data VS. Air Quality Data (ROADSIDE ONLY)

drop_Roadside <- c('Month','Year', 'Date.x', 'Date.y', 'Nitric_Oxide_R','Nitrogen_Dioxide_R', 'Oxides_of_Nitrogen_R', 'Ozone_R', 'PM10_Particulate_R', 'PM2.5_Particulate_R', 'Sulphur_Dioxide_R')
corr_Roadside <- cor(WeatherDataVsGovUK[ , !(names(WeatherDataVsGovUK) %in% drop_Roadside)], use="pairwise.complete.obs", method=c("spearman"))

r<-ggcorrplot(corr_Roadside, hc.order = TRUE, type = "lower",
              outline.col = "white",
              ggtheme = ggplot2::theme_gray,
              colors = c("#6D9EC1", "white", "#E46726"))+
  ggtitle("Correlation of variables between Air Quality Data (Roadside) and TW Weather Data in London, 2008-2018")
ggplotly(r)
## We recommend that you use the dev version of ggplot2 with `ggplotly()`
## Install it with: `devtools::install_github('hadley/ggplot2')`

Thames Water Weather Data VS. Air Quality Data (BACKGROUND ONLY)

drop_Background <- c('Month','Year', 'Date.x', 'Date.y', 'Nitric_Oxide_B','Nitrogen_Dioxide_B', 'Oxides_of_Nitrogen_B', 'Ozone_B', 'PM10_Particulate_B', 'PM2.5_Particulate_B', 'Sulphur_Dioxide_B')
corr_Background <- cor(WeatherDataVsGovUK[ , !(names(WeatherDataVsGovUK) %in% drop_Background)], use="pairwise.complete.obs", method=c("spearman"))

b<-ggcorrplot(corr_Background, hc.order = TRUE, type = "lower",
              outline.col = "white",
              ggtheme = ggplot2::theme_gray,
              colors = c("#6D9EC1", "white", "#E46726"))+
  ggtitle("Correlation of variables between Air Quality Data (Background) and TW Weather Data in London, 2008-2018")
ggplotly(b)
## We recommend that you use the dev version of ggplot2 with `ggplotly()`
## Install it with: `devtools::install_github('hadley/ggplot2')`

not much of a difference whether the air quality data was collected from the background of London or at the roadside… similar trends are apparent between our weather data variables.

Linear Models

OzoneB_MaxTemp <- ggplot(WeatherDataVsGovUK, aes(x = Maxtemp, y=Ozone_B))+
  geom_point()+
  geom_smooth(method='lm', method.args = list(family = "symmetric"), formula=y~x)+
  labs(x = "Maxtemp", y = "Ozone_B")+
  theme(legend.position="none")
  

OzoneB_MinTemp <- ggplot(WeatherDataVsGovUK, aes(x = Mintemp, y=Ozone_B))+
  geom_point()+
  geom_smooth(method='lm', method.args = list(family = "symmetric"), formula=y~x)+
  labs(x = "Mintemp", y = "Ozone_B")+
  theme(legend.position="none")

OzoneR_MaxTemp <- ggplot(WeatherDataVsGovUK, aes(x = Maxtemp, y=Ozone_R))+
  geom_point()+
  geom_smooth(method='lm', method.args = list(family = "symmetric"), formula=y~x)+
  labs(x = "Maxtemp", y = "Ozone_R")+
  theme(legend.position="none")

OzoneR_MinTemp <- ggplot(WeatherDataVsGovUK, aes(x = Mintemp, y=Ozone_R))+
  geom_point()+
  geom_smooth(method='lm', method.args = list(family = "symmetric"), formula=y~x)+
  labs(x = "Mintemp", y = "Ozone_R")+
  theme(legend.position="none")

p <- ggplotly(subplot(OzoneB_MaxTemp, OzoneB_MinTemp, OzoneR_MaxTemp, OzoneR_MinTemp, nrows = 2, margin = 0.05, titleX = TRUE, titleY = TRUE)) %>%
  layout(showlegend = FALSE, title = "Solar Levels Against Thames Water Data") 
## We recommend that you use the dev version of ggplot2 with `ggplotly()`
## Install it with: `devtools::install_github('hadley/ggplot2')`
## We recommend that you use the dev version of ggplot2 with `ggplotly()`
## Install it with: `devtools::install_github('hadley/ggplot2')`
## We recommend that you use the dev version of ggplot2 with `ggplotly()`
## Install it with: `devtools::install_github('hadley/ggplot2')`
## We recommend that you use the dev version of ggplot2 with `ggplotly()`
## Install it with: `devtools::install_github('hadley/ggplot2')`
ggplotly(p)

Insight & Graph Descriptions

These points are the same ones used for the correlations and for the first linegraphs of the document. It would have been better to use daily points for them, but the Air Quality Dataset is only monthly (and hourly..), so this isn’t possible.

Proving our earlier point, there isn’t much of a difference between background and roadside levels when it comes to relationships with our weather data. Ozone and temperature show a positive linear relationship, which is stronger looking at the maximum temperature than the minimum temperature of point. All plots show large levels of spread, though confidence intervals around the regression line are all relatively narrow.

corresponding statistics

library(sjPlot)
## Warning: package 'sjPlot' was built under R version 3.4.4
## Warning in checkMatrixPackageVersion(): Package version inconsistency detected.
## TMB was built with Matrix version 1.2.14
## Current Matrix version is 1.2.10
## Please re-install 'TMB' from source using install.packages('TMB', type = 'source') or ask CRAN for a binary version of 'TMB' matching CRAN's 'Matrix' package
fit1 <- lm(Ozone_B ~ Maxtemp, data=WeatherDataVsGovUK)
fit2 <- lm(Ozone_B ~ Mintemp, data=WeatherDataVsGovUK)
fit3 <- lm(Ozone_B ~ Maxtemp + Mintemp, data=WeatherDataVsGovUK)
fit4 <- lm(Ozone_R ~ Maxtemp, data=WeatherDataVsGovUK)
fit5 <- lm(Ozone_R ~ Mintemp, data=WeatherDataVsGovUK)
fit6 <- lm(Ozone_R ~ Maxtemp + Mintemp, data=WeatherDataVsGovUK)

sjt.lm(fit1,fit2, fit3, file="output.html")# You have to save the table in html format.
## Fitted models have different coefficients. Grouping may not work properly. Set `group.pred = FALSE` if you encouter cluttered labelling.
    Ozone_B   Ozone_B   Ozone_B
    B CI p   B CI p   B CI p
(Intercept)   20.75 15.61 – 25.89 <.001   28.23 24.22 – 32.24 <.001   5.21 -1.39 – 11.81 .121
Maxtemp   1.01 0.70 – 1.32 <.001       4.47 3.35 – 5.58 <.001
Mintemp       0.99 0.56 – 1.42 <.001   -4.58 -6.01 – -3.15 <.001
Observations   123   123   123
R2 / adj. R2   .253 / .247   .146 / .139   .440 / .431
sjt.lm(fit4,fit5, fit6, file="output.html")
## Fitted models have different coefficients. Grouping may not work properly. Set `group.pred = FALSE` if you encouter cluttered labelling.
    Ozone_R   Ozone_R   Ozone_R
    B CI p   B CI p   B CI p
(Intercept)   17.25 13.16 – 21.34 <.001   22.04 18.92 – 25.16 <.001   6.77 1.29 – 12.26 .016
Maxtemp   0.63 0.38 – 0.88 <.001       2.96 2.04 – 3.89 <.001
Mintemp       0.61 0.27 – 0.94 <.001   -3.09 -4.28 – -1.90 <.001
Observations   123   123   123
R2 / adj. R2   .173 / .167   .096 / .088   .322 / .311

Insight

These are quite interesting. Among the minimum temperature points, for 1 degree rise in temperature, background ozone levels go up by 0.99 (less than y=x), but among the maximum temperature of points, for each 1 degree rise in temperature, background ozone levels go up by 1.01 (more than y=x). Generally, then, as temperature rises by 1 degree, ozone levels increase by approximately 1 degree (between 0.99 and 1.01). However, differentiating intercepts between the maximum temperature regression line & the minimum temperature regression line influence this difference.

Conclusion-TW data against Ozone levels

Looking at these results from both the London roadside data and the background data for ozone levels, we can reject the hypothesis of no linear relationship between temperature levels and ozone. There is signifiant evidence to suggest that the variation in ozone influenced by temperature is not due to chance.(less than 1% probability due to chance)

Dataset 2: Solar Panel Energy Generation Data

The 2nd set of data is a dataframe that contains information about the energy generated in varies parts of London & the south west between 27th July 2013 - 19th November 2014.It’s another dataframe that has come from the London Datastore site.

Possible comparisons?

  • Solar Energy VS. Min & Max temperature
  • Solar Energy VS. Sunshine
  • Solar Energy VS. Precipitation

Code:

### Server connector
weatherConn <- odbcDriverConnect('driver={SQL Server};server=rlibdbwp01.twutil.net;database=weatherDB;trusted_connection=true')

#if we want to get everything from the weather data..
# sqlParam <- "SELECT  * FROM daily "
# dailyData <- sqlQuery(weatherConn, sqlParam)


### Gets smd paramaters from the server
sqlParam <- "SELECT  * FROM daily WHERE [RecordTypeNumber] = 2 AND [Location] = 'Central London' AND ([Date] BETWEEN '25 Nov 2013' AND '30 Nov 2014')" #London Air Quality data only goes back as far as 2008. Phil suggested using "record type number 2".
dailyData <- sqlQuery(weatherConn, sqlParam)

#Extracting Month and Years
dailyData <- dailyData %>%
  mutate(Date = as.Date(dailyData$Date), Month = months(dailyData$Date), Year = format(dailyData$Date, format="%y"))

#Averages of the variables 

DailyDataMean <- aggregate(cbind(Maxtemp, Mintemp, Sunshine, Precipitation, Frost) ~ Month + Year, data = dailyData, mean)
#by month and year type, as this is how the data from gov.uk is laid out.

#Loading Solar Data----

Solar_Data <- read_excel("H:/my office/Gwen Stuff/Projects/Project4 - Weather Data/WeatherDataVSOpenSourceData/Copy of Weather Data 2014-11-30.xlsx", 

                                                                                                                   sheet = "by_Date_R")
Solar_Data$Date <- as.Date(Solar_Data$Date)

Solar_Data <- setnames(Solar_Data, old = c('Easthill Road', 'Elm Crescent', 'Forest Road', 'Maple Drive East', 'Grand Total'), new = c('Easthill_Road', 'Elm_Crescent', 'Forest_Road', 'Maple_Drive_East', 'Grand_Total_Solar'))

#Plots----

WeatherDataVsGovUK <- dailyData %>%
  inner_join(Solar_Data)
## Joining, by = "Date"

Here are some plots of these variables (Solar from gov.uk, rest from TW) plotted against date (time series data)

Subplots

library(methods) 
Solar<- ggplot(WeatherDataVsGovUK, aes(x = Date, label = Date))+
  geom_line(aes(y = Grand_Total_Solar, colour = "Total Solar Levels Across Regions"))+
  scale_color_manual(values=c("#FF6600"))+
  theme(axis.text.x = element_text(angle = 90, hjust = 1))

  
Temp <- ggplot(WeatherDataVsGovUK, aes(x = Date, label = Date))+
  geom_line(aes(y = Mintemp, colour = "Min temperature (°C)")) + 
  geom_line(aes(y = Maxtemp, colour = "Max temperature (°C)")) +
  scale_color_manual(values=c("purple", "blue"))+
  theme(axis.text.x = element_text(angle = 90, hjust = 1))


Sunshine <- ggplot(WeatherDataVsGovUK, aes(x = Date, label = Date))+
  geom_line(aes(y = Sunshine, colour = "Sunshine"))+
  scale_color_manual(values=c("#FFCC33"))+
  theme(axis.text.x = element_text(angle = 90, hjust = 1))

Rain <- ggplot(WeatherDataVsGovUK, aes(x = Date, label = Date))+
  geom_line(aes(y = Precipitation, colour = "Rain"))+
  scale_color_manual(values=c("#6699CC"))+
  theme(axis.text.x = element_text(angle = 90, hjust = 1))

p <- ggplotly(subplot(Solar, Temp, Sunshine, Rain, nrows = 2, margin = 0.05))%>%
  layout(title = "TW Weather data VS. London.gov.uk Solar Data")
## We recommend that you use the dev version of ggplot2 with `ggplotly()`
## Install it with: `devtools::install_github('hadley/ggplot2')`
## We recommend that you use the dev version of ggplot2 with `ggplotly()`
## Install it with: `devtools::install_github('hadley/ggplot2')`
## We recommend that you use the dev version of ggplot2 with `ggplotly()`
## Install it with: `devtools::install_github('hadley/ggplot2')`
## We recommend that you use the dev version of ggplot2 with `ggplotly()`
## Install it with: `devtools::install_github('hadley/ggplot2')`
ggplotly(p)

Graph Descriptions

Here are some line graphs of the variables we’re most likely interested in between our weather data against London.gov.uk’s data on Solar energy generation levels. We see these variables ploted against time: how they all relate as time passes.

The x-axis is date, from 25/11/2013-30/11/2014. This time round we have daily data from our online source, so we can look at our data more closely and more accurately as we don’t need to do average levels for our weather data.

The y-axis is still “value” with no units. The difference this time is that each variable is in a different plot, rather than overlaying eachother like in the first subplot of this document. This is because as the data is fluctuating more frequently from the measurements being taken daily, it would be too messy to put multiple variables in the same plot. So, we can simply compare each of them between the subplots.

Legends of the plots explained:

London.gov.uk Quality Data Variables:

  • — Total Solar Levels (across all regions)

Thames Water Weather Data Variables:

  • — max_temp
  • — min_temp
  • — Sunshine

For all these variables, they peak & reach their lowest points only once (occasionally twice) a year.

Insight

Solar and sunshine levels follow similar trends across the year. When levels of solar energy dip down in the middle of the year at 3 points, sunshine levels dip on exactly the same day. These days are:

  • 27/05/2014
  • 10/07/2014
  • 25/08/2014

As sunshine dips on exactly the same days, it definitely shows a sign of comparative ability between the datasets.

On the other hand, sunshine levels fluctuate and vary much more than solar energy levels do. Sunshine levels can vary massively from one day to the next, but less of that is seen with solar energy levels. Maybe it’s worth asking:

What is the probability that the difference in solar from one day to the next is greater than X? What is the probability that the difference in sunshine levels from one day to the next is greater than X?

The difference between Solar levels and temperature is more interesting. There seems to be a delay in the highest temperatures of the year (June-August) compared with the highest solar energy generation levels of the year (April-June).

Correlation

corr1 <- cor(WeatherDataVsGovUK[c("Maxtemp", "Mintemp", "Sunshine", "Precipitation", "Grand_Total_Solar")],use="pairwise.complete.obs", method=c("spearman"))

p<-ggcorrplot(corr1, hc.order = TRUE, type = "lower",
              outline.col = "black", 
              ggtheme = ggplot2::theme_gray,
              colors = c("#6D9EC1", "white", "#E46726"))+
  ggtitle("Solar Data VS Weather Data")
ggplotly(p)
## We recommend that you use the dev version of ggplot2 with `ggplotly()`
## Install it with: `devtools::install_github('hadley/ggplot2')`

Insight

A relatively strong positive correlation between maximum temperature and solar & sunshine and solar. Perhaps surprisingly, there is a stronger correlation between max. temperature and solar compared with sunshine and solar, when using the Spearman’s correlation method. Meaning, solar levels rise more when temperature rises than when sunshine levels rise.

Solar levels have a negative correlation with precipitaition, which is to be expected.

Linear Models

Here are plots of the data from TW’s weather data plotted against Solar levels (gov.uk) and their corresponding statistics.

Seeing these fits in action….

SolarVSMaxtemp <- ggplot(WeatherDataVsGovUK, aes(x = Maxtemp, y=Grand_Total_Solar))+
  geom_point()+
  geom_smooth(method='lm', method.args = list(family = "symmetric"), formula=y~x)

SolarVSMintemp <- ggplot(WeatherDataVsGovUK, aes(x = Mintemp, y=Grand_Total_Solar))+
  geom_point()+
  geom_smooth(method='lm', method.args = list(family = "symmetric"), formula=y~x)

SolarVSunshine <- ggplot(WeatherDataVsGovUK, aes(x = Sunshine, y=Grand_Total_Solar))+
  geom_point()+
  geom_smooth(method='lm', method.args = list(family = "symmetric"), formula=y~x)

SolarVSRain <- ggplot(WeatherDataVsGovUK, aes(x = Precipitation, y=Grand_Total_Solar))+
  geom_point()+
  geom_smooth(method='lm', method.args = list(family = "symmetric"), formula=y~x)

p <- ggplotly(subplot(SolarVSMaxtemp, SolarVSMintemp, SolarVSunshine, SolarVSRain, nrows = 2, margin = 0.05, titleX = TRUE)) %>%
  layout(showlegend = FALSE, title = "Solar Levels Against Thames Water Data") 
## We recommend that you use the dev version of ggplot2 with `ggplotly()`
## Install it with: `devtools::install_github('hadley/ggplot2')`
## We recommend that you use the dev version of ggplot2 with `ggplotly()`
## Install it with: `devtools::install_github('hadley/ggplot2')`
## We recommend that you use the dev version of ggplot2 with `ggplotly()`
## Install it with: `devtools::install_github('hadley/ggplot2')`
## We recommend that you use the dev version of ggplot2 with `ggplotly()`
## Install it with: `devtools::install_github('hadley/ggplot2')`
ggplotly(p)

Here we learn why the correlation coefficient for the correlation between minimum temperature and Solar levels isn’t as great as the correlation coefficient for maximum temperature and solar levels… Minimum temperature behavious is more eratic and rising away from, not along with, solar levels.

The scatter plot between precipitation and levels of solar energy is very peculiar. The confidence interval becomes very wide after the 15.47 point, as the outliers behave differently from the rest of the data.

Statistics

library(sjPlot)
fit1 <- lm(Grand_Total_Solar ~ Maxtemp, data=WeatherDataVsGovUK)
fit2 <- lm(Grand_Total_Solar ~ Mintemp, data=WeatherDataVsGovUK)
fit3 <- lm(Grand_Total_Solar ~ Maxtemp + Mintemp, data=WeatherDataVsGovUK)
fit4 <- lm(Grand_Total_Solar ~ Sunshine, data=WeatherDataVsGovUK)
fit5<- lm(Grand_Total_Solar ~ Precipitation, data=WeatherDataVsGovUK)


sjt.lm(fit1,fit2, fit3, file="output.html")
## Fitted models have different coefficients. Grouping may not work properly. Set `group.pred = FALSE` if you encouter cluttered labelling.
    Grand_Total_Solar   Grand_Total_Solar   Grand_Total_Solar
    B CI p   B CI p   B CI p
(Intercept)   -364.43 -521.14 – -207.73 <.001   235.88 105.88 – 365.88 <.001   -514.57 -674.97 – -354.17 <.001
Maxtemp   69.78 60.80 – 78.76 <.001       112.23 94.62 – 129.83 <.001
Mintemp       61.53 48.57 – 74.50 <.001   -61.21 -83.32 – -39.11 <.001
Observations   371   371   371
R2 / adj. R2   .387 / .386   .191 / .189   .433 / .430
sjt.lm(fit4,fit5, file="output.html")
## Fitted models have different coefficients. Grouping may not work properly. Set `group.pred = FALSE` if you encouter cluttered labelling.
    Grand_Total_Solar   Grand_Total_Solar
    B CI p   B CI p
(Intercept)   217.31 125.63 – 308.99 <.001   856.27 786.62 – 925.92 <.001
Sunshine   113.90 98.64 – 129.16 <.001    
Precipitation       -28.01 -42.41 – -13.61 <.001
Observations   371   371
R2 / adj. R2   .369 / .367   .038 / .036

Extra insight

### Server connector
weatherConn <- odbcDriverConnect('driver={SQL Server};server=rlibdbwp01.twutil.net;database=weatherDB;trusted_connection=true')

#if we want to get everything from the weather data..
# sqlParam <- "SELECT  * FROM daily "
# dailyData <- sqlQuery(weatherConn, sqlParam)


### Gets smd paramaters from the server
sqlParam <- "SELECT  * FROM daily WHERE [RecordTypeNumber] = 2 AND [Location] = 'Central London' AND [Date] >= '1 Jan 2008' " #London Air Quality data only goes back as far as 2008. Phil suggested using "record type number 2".
dailyData <- sqlQuery(weatherConn, sqlParam)

#Extracting Month and Years
dailyData$Date <- as.Date(dailyData$Date)
dailyData$Month <- months(dailyData$Date)
dailyData$Year <- format(dailyData$Date,format="%y")

dailyData <- dailyData %>%
  mutate(Difference = Maxtemp - Mintemp)
#Averages of the variables 
#by month and year type, as this is how the data from gov.uk is laid out.

DailyDataMean <- aggregate(cbind(Maxtemp, Mintemp, Sunshine, Precipitation, Frost) ~ Month + Year, data = dailyData, mean)

#Q: are we interested in frost??

Comparing Minimum & Maximum Temperature from TW database

insight1 <- ggplot(dailyData, aes(x=Month, y=Difference))+
  geom_boxplot()+
  stat_summary(fun.y=mean, geom="line", aes(group=1))+ 
stat_summary(fun.y=mean, geom="point")
ggplotly(insight1)
## We recommend that you use the dev version of ggplot2 with `ggplotly()`
## Install it with: `devtools::install_github('hadley/ggplot2')`

Here is a boxplot showing the range of differences between the Maximum temperature and the minimum temperatures of each day collected between 2008 and 2018 for each month of the year.

Forecasting - PROPHET

Leo has introduced us to the package prophet: a forecasting procedure created by Facebook Open Source for time series data. It can be used in R and python.

More on Python can be found here: https://facebook.github.io/prophet/

Lets see how it can be implemented in our data.

### Server connector
weatherConn <- odbcDriverConnect('driver={SQL Server};server=rlibdbwp01.twutil.net;database=weatherDB;trusted_connection=true')

#if we want to get everything from the weather data..
# sqlParam <- "SELECT  * FROM daily "
# dailyData <- sqlQuery(weatherConn, sqlParam)


### Gets smd paramaters from the server
sqlParam <- "SELECT  * FROM daily WHERE [RecordTypeNumber] = 2 AND [Location] = 'Central London' AND [Date] >= '1 Jan 2008' " #London Air Quality data only goes back as far as 2008. Phil suggested using "record type number 2".
dailyData <- sqlQuery(weatherConn, sqlParam)

#Extracting Month and Years
dailyData$Date <- as.Date(dailyData$Date)
dailyData$Month <- months(dailyData$Date)
dailyData$Year <- format(dailyData$Date,format="%y")

dailyData <- dailyData %>%
  mutate(Difference = Maxtemp - Mintemp)
#Averages of the variables 
#by month and year type, as this is how the data from gov.uk is laid out.

DailyDataMean <- aggregate(cbind(Maxtemp, Mintemp, Sunshine, Precipitation, Frost) ~ Month + Year, data = dailyData, mean)


#Max temp data
MaxTempData <- dailyData %>% 
  dplyr::select(Date, Maxtemp) %>%
   plyr::rename(c('Maxtemp'='y', 'Date' = 'ds'))
MaxTempProphet <- prophet(MaxTempData)
## Disabling daily seasonality. Run prophet with daily.seasonality=TRUE to override this.
## Initial log joint probability = -155.325
## Optimization terminated normally: 
##   Convergence detected: relative gradient magnitude is below tolerance
MaxTempFuture <- make_future_dataframe(MaxTempProphet, periods = 365)
MaxTempForecast <- predict(MaxTempProphet, MaxTempFuture)


#min temp
MinTempData <- dailyData %>% 
  dplyr::select(Date, Mintemp) %>%
  plyr::rename(c('Mintemp'='y', 'Date' = 'ds'))

MinTempProphet <- prophet(MinTempData)
## Disabling daily seasonality. Run prophet with daily.seasonality=TRUE to override this.
## Initial log joint probability = -225.712
## Optimization terminated normally: 
##   Convergence detected: relative gradient magnitude is below tolerance
MinTempFuture <- make_future_dataframe(MinTempProphet, periods = 365)
MinTempForecast <- predict(MinTempProphet, MinTempFuture)

#The difference!

DifferenceMinMaxData <- dailyData %>% 
  dplyr::select(Date, Difference) %>%
  plyr::rename(c('Difference'='y', 'Date' = 'ds'))
DifferenceMinMaxProphet <- prophet(DifferenceMinMaxData)
## Disabling daily seasonality. Run prophet with daily.seasonality=TRUE to override this.
## Initial log joint probability = -99.0907
## Optimization terminated normally: 
##   Convergence detected: relative gradient magnitude is below tolerance
DifferenceMinMaxFuture <- make_future_dataframe(DifferenceMinMaxProphet, periods = 365)
DifferenceMinMaxForecast <- predict(DifferenceMinMaxProphet, DifferenceMinMaxFuture)


#Q: are we interested in frost??

Forecasting Maximum Temperature overtime

#max temp
ggplotly(plot(MaxTempProphet, MaxTempForecast))
## We recommend that you use the dev version of ggplot2 with `ggplotly()`
## Install it with: `devtools::install_github('hadley/ggplot2')`
prophet_plot_components(MaxTempProphet, MaxTempForecast, weekly_start = 1)

Forecasting Minimum Temperature Overtime

#mintemp
ggplotly(plot(MinTempProphet, MinTempForecast))
## We recommend that you use the dev version of ggplot2 with `ggplotly()`
## Install it with: `devtools::install_github('hadley/ggplot2')`
prophet_plot_components(MinTempProphet, MinTempForecast, weekly_start = 1)

Forecasting the Difference Overtime

ggplotly(plot(DifferenceMinMaxProphet, DifferenceMinMaxForecast))
## We recommend that you use the dev version of ggplot2 with `ggplotly()`
## Install it with: `devtools::install_github('hadley/ggplot2')`
prophet_plot_components(DifferenceMinMaxProphet, DifferenceMinMaxForecast, weekly_start = 1)

What’s really interesting is that the difference between the minimum and maximum temperatures per day follow a really similar pattern to the normal trends! So, in general, when temperature reaches its peak per year (max temperature and min temperature will be reaching their peaks), then the difference between the two reach their peaks also!

But, this is assuming we know nothing about anything else that could be influencing these trends. What are the main reasons that minimum and maximum temperatures follow in this way? A way to do this is by adding regressors to the forecast, so that we see another trend with the regressors that could be having an impact.

Lets do this now for the Max temperature:

Forecasting Max Temperature overtime With Extra Regressors

### Server connector
weatherConn <- odbcDriverConnect('driver={SQL Server};server=rlibdbwp01.twutil.net;database=weatherDB;trusted_connection=true')

#if we want to get everything from the weather data..
# sqlParam <- "SELECT  * FROM daily "
# dailyData <- sqlQuery(weatherConn, sqlParam)


### Gets smd paramaters from the server
sqlParam <- "SELECT  * FROM daily WHERE [RecordTypeNumber] = 2 AND [Location] = 'Central London' AND [Date] >= '1 Jan 2008' " #London Air Quality data only goes back as far as 2008. Phil suggested using "record type number 2".
dailyData <- sqlQuery(weatherConn, sqlParam)

#Extracting Month and Years
dailyData$Date <- as.Date(dailyData$Date)
dailyData$Month <- months(dailyData$Date)
dailyData$Year <- format(dailyData$Date,format="%y")

dailyData <- dailyData %>%
  mutate(Difference = Maxtemp - Mintemp)
#Averages of the variables 
#by month and year type, as this is how the data from gov.uk is laid out.

DailyDataMean <- aggregate(cbind(Maxtemp, Mintemp, Sunshine, Precipitation, Frost) ~ Month + Year, data = dailyData, mean)

dailyData <- dailyData %>%
  plyr::rename(c('Maxtemp'='y', 'Date' = 'ds')) %>% 
  na.omit()


MaxTempProphet2 <- prophet()
MaxTempProphet2 <- add_regressor(MaxTempProphet2, 'Sunshine')
MaxTempProphet2 <- fit.prophet(MaxTempProphet2, dailyData)
## Disabling daily seasonality. Run prophet with daily.seasonality=TRUE to override this.
## Initial log joint probability = -155.273
## Optimization terminated normally: 
##   Convergence detected: relative gradient magnitude is below tolerance
MaxTempProphet2$changepoints <- as.Date(MaxTempProphet2$changepoints)

MaxTempFuture2 <- make_future_dataframe(MaxTempProphet2, periods = 365)
MaxTempFuture2$ds <- as.Date(MaxTempFuture2$ds)
 MaxTempFuture2 <- MaxTempFuture2 %>%
   left_join(dailyData) %>%
   na.omit()
## Joining, by = "ds"
MaxTempForecast2 <- predict(MaxTempProphet2, MaxTempFuture2)
ggplotly(plot(MaxTempProphet2, MaxTempForecast2))
## We recommend that you use the dev version of ggplot2 with `ggplotly()`
## Install it with: `devtools::install_github('hadley/ggplot2')`
prophet_plot_components(MaxTempProphet2, MaxTempForecast2, weekly_start = 1, plot_cap = FALSE)

This time we can see how sunshine changes throughout time and the affect that has on temperature. During the earlier months of the year (such as December and January), the extra_regressors plot dips down significantly, which means that the levels of sunshine during those months have a negative impact on temperature compared with if there was no sunshine affect at all. In contrast, during the warmer months, the levels oh sunshine have a positive impact on temperature and at most have an effect of 3 degrees extra on temperature.

The line plots of trends, weekly and yearly don’t change that much whether or not we consider sunshine as an additional regressor.